Skip to content

Fix corpus test round-trip failures and achieve Rust escape_debug parity#143

Merged
philhassey merged 11 commits intocedar-policy:mainfrom
strongdm:corpus-test-round-trip
Mar 18, 2026
Merged

Fix corpus test round-trip failures and achieve Rust escape_debug parity#143
philhassey merged 11 commits intocedar-policy:mainfrom
strongdm:corpus-test-round-trip

Conversation

@philhassey
Copy link
Copy Markdown
Contributor

Summary

This PR fixes all ~410 corpus round-trip test failures (Cedar text and JSON) and achieves 1:1 parity with Rust's char::escape_debug() / str::escape_debug() for string escaping.

Round-trip test infrastructure

  • Add Cedar text and JSON round-trip testing to the corpus test harness, validating that parse → marshal → parse produces equivalent results for every corpus policy

Cedar text escaping (EntityUID, String, Pattern)

  • EntityUID: MarshalCedar() now uses rust.EscapeString instead of strconv.Quote, fixing invalid Go-style escapes (\b, \a, \f, \v) that Cedar's parser rejects
  • Pattern: MarshalCedar() now uses rust.EscapeCharAll instead of strconv.Quote, fixing the same class of invalid escapes (had a TODO acknowledging this)
  • Pattern JSON: Fix marshal of empty-string literals and special characters in pattern components

Cedar text operator marshaling

  • Reserved keywords: canMarshalAsIdent() now rejects empty strings and reserved keywords (true, false, if, in, etc.), preventing invalid Cedar like context.true instead of context["true"]
  • Operator parenthesization: Fix parenthesization for all binary operators based on associativity:
    • Non-associative relation ops (==, !=, <, <=, >, >=, in, has, like, is) now parenthesize both operands at same precedence
    • Left-associative ops (+, -, *, &&, ||) now parenthesize their right operand at same precedence, preventing (a - b) - (c - d) from flattening to a - b - c - d

JSON round-trip fixes

  • Empty Set: Change Set field from arrayJSON to *arrayJSON so omitempty distinguishes "no Set" from "empty Set"
  • Empty Record: Change Record field from recordJSON to *recordJSON (same pattern as Set fix)
  • Record with extension calls: Change recordJSON from map[string]nodeJSON to map[string]*nodeJSON. Since nodeJSON.MarshalJSON() has a pointer receiver, Go's json.Marshal couldn't call it on map values, silently dropping extension calls (like duration(), datetime()) stored in the json:"-" field
  • Nil-vs-empty Record maps: Ensure NewRecord(nil) and NewRecord(RecordMap{}) serialize distinctly

Rust escape_debug 1:1 parity

  • Port Rust's is_printable() lookup tables from library/core/src/unicode/printable.rs verbatim, replacing Go's unicode.IsPrint() which disagrees at boundaries (U+00A0, U+00AD, U+FFFC, U+FFFD, etc.)
  • Port Rust's Grapheme_Extend detection as Mn + Me + Other_Grapheme_Extend, replacing incorrect Mn + Me + Mc. Spacing combining marks (Mc) like U+0903 are NOT in Grapheme_Extend; Other_Grapheme_Extend chars like U+FF9E/U+FF9F are now correctly detected
  • Implement str::escape_debug() first/continuation distinction: grapheme extend chars escaped as first char but passed through raw in continuation positions
  • Add EscapeCharAll() for Pattern marshaling, matching Rust Cedar's per-character char::escape_debug() with ESCAPE_ALL

Test coverage

  • All ~410 corpus round-trip failures resolved (Cedar text + JSON)
  • Exhaustive parity tests against Rust 1.93 reference output for 160+ codepoints
  • 100% code coverage across all packages
  • make test and make linters pass clean

Test plan

  • make test — all packages pass at 100% coverage
  • make linters — golangci-lint clean, no insufficient coverage
  • Corpus round-trip tests pass for all ~830 corpus test files
  • Exhaustive escape_debug parity validated against Rust 1.93.1

🤖 Generated with Claude Code

philhassey and others added 2 commits March 18, 2026 11:40
Test policy sets through cedar and JSON round-trips, and assert
entity map equality after JSON round-trip.

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change the Set field in nodeJSON from arrayJSON to *arrayJSON so that
omitempty distinguishes "no Set" from "empty Set", preventing empty
Cedar sets from marshaling as {} instead of {"Set":[]}.

In Record.UnmarshalJSON, return a zero-value Record when the input
map is empty, so that round-tripping through {"tags":{}} preserves
nil-map equality with entities that had no tags field originally.

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@philhassey philhassey force-pushed the corpus-test-round-trip branch from abafe5c to ac4b77e Compare March 18, 2026 21:16
philhassey and others added 2 commits March 18, 2026 15:22
In Pattern.MarshalJSON, always emit a Literal component when the
component is not a wildcard, even when the literal is "". Use
json.Marshal for the literal value to properly escape special
characters like null bytes.

In ParsePattern, when the input is an empty string, produce a single
empty-literal component so that "like \"\"" round-trips correctly
through JSON as [{"Literal":""}] instead of [].

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EntityUID.MarshalCedar() was delegating to String() which uses
strconv.Quote, producing Go-style escapes (\b, \a, \f, \v) that are
not valid in Cedar. Switch to rust.EscapeString which produces
Cedar-compatible escapes (\u{8}, \u{7}, \u{c}, \u{b}).

Leave String() unchanged for Go-idiomatic display/debugging.

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@philhassey philhassey force-pushed the corpus-test-round-trip branch 2 times, most recently from 87caffc to 0749e79 Compare March 18, 2026 21:27
philhassey and others added 7 commits March 18, 2026 15:34
Pattern.MarshalCedar() was using strconv.Quote which produces
Go-style escapes (\x00, \v) that are not valid in Cedar. Switch
to rust.EscapeString for Cedar-compatible escapes (\0, \u{b}).

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
canMarshalAsIdent("") returned true because the loop over an empty
string iterates zero times. It also accepted reserved keywords like
"true", "false", "if", "in", etc., producing invalid Cedar like
`context.true` instead of `context["true"]`.

Add an early return for empty strings and reserved keywords, using
the existing IsReservedKeyword helper.

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Binary operators were not parenthesizing children correctly based on
associativity:

- Non-associative relation ops (==, !=, <, <=, >, >=, in) and keyword
  ops (has, like, is, is-in) need to parenthesize both operands at the
  same precedence, since (a == b) == c is valid but a == b == c is not.

- Left-associative ops (+, -, *, &&, ||) need to parenthesize their
  right operand at the same precedence, since (a - b) - (c - d) must
  not be flattened to a - b - c - d which changes semantics.

Split marshalInfixBinaryOp to take separate left/right precedence
levels. Left-associative ops pass (p, p+1) and non-associative ops
pass (p+1, p+1).

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two issues with Record in nodeJSON:

1. Record field was a value type (recordJSON = map[string]nodeJSON)
   with omitempty, so empty records were silently dropped during
   JSON marshal, producing {} instead of {"Record":{}}.
   Fix: change to *recordJSON pointer (same pattern as the Set fix).

2. recordJSON mapped keys to nodeJSON values (not pointers). Since
   nodeJSON.MarshalJSON has a pointer receiver, Go's json.Marshal
   cannot call it on map values (can't take address of map element).
   This caused the default marshaler to be used, which skips the
   ExtensionCall field (tagged json:"-"), silently dropping extension
   calls like duration() and datetime() inside records.
   Fix: change recordJSON to map[string]*nodeJSON so the custom
   MarshalJSON is always invoked.

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the approximate EscapeString/ShouldEscape implementation with
an exact port of Rust's char::escape_debug() and str::escape_debug():

- Port Rust's is_printable() lookup tables from printable.rs verbatim,
  replacing Go's unicode.IsPrint() which disagrees at boundaries like
  U+00A0, U+00AD, U+FFFC, and U+FFFD.

- Port Rust's Grapheme_Extend property as Mn + Me + Other_Grapheme_Extend,
  replacing the incorrect Mn + Me + Mc check. Mc (spacing combining marks
  like U+0903) are NOT in Grapheme_Extend; Other_Grapheme_Extend chars
  like U+FF9E and U+FF9F now correctly detected.

- Implement str::escape_debug first/continuation distinction: grapheme
  extend chars are escaped as first char (ESCAPE_ALL) but passed through
  raw in continuation positions (CharEscapeDebugContinue).

- Add EscapeCharAll for Pattern marshaling, matching Rust Cedar's
  per-character char::escape_debug() with ESCAPE_ALL on every char.

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validate escapeRune, EscapeString, EscapeCharAll, isPrintable, and
isGraphemeExtended against exact Rust 1.93 char::escape_debug() and
str::escape_debug() output for 160+ codepoints covering:

- All C0/C1 controls, full Latin-1 supplement boundary (0xA0-0xBF)
- Combining marks (Mn, Me, Mc), format chars, separators, BOM
- Halfwidth katakana, hangul jungseong, tag chars, variation selectors
- SMP and supplementary plane boundaries
- str first-vs-continuation grapheme extend distinction
- Round-trip correctness through Unquote for both escape modes

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each corpus generation target now uses a private mktemp -d with a
trap for cleanup on failure, eliminating shared /tmp/corpus-tests
collisions and leaked temp files. Quote basename arguments for
safety with special characters. Add extracted corpus directories
to .gitignore.

Signed-off-by: Phil Hassey <phil@strongdm.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@philhassey philhassey force-pushed the corpus-test-round-trip branch from 0749e79 to d627758 Compare March 18, 2026 21:34
@philhassey philhassey merged commit a4d0ae8 into cedar-policy:main Mar 18, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants